proc: implement /proc/[pid]/setgroups #13147

Open: shayonj wants to merge 1 commit into google:master from shayonj:setgroups-impl

@shayonj (Contributor) commented May 12, 2026

Without /proc/[pid]/setgroups the canonical rootless user-namespace setup
cannot run. The expected sequence starts by writing "deny" to
/proc/self/setgroups so that an unprivileged process can then write a
non-trivial gid_map, and that file did not exist in gVisor. Tools that
take this path, like buildah or podman, fail with ENOENT on
/proc/[pid]/setgroups before they can do any work.
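
The ordering described above can be sketched in Go as follows. This is only an illustration, not code from this PR: `procWrite`, `rootlessPlan`, and `applyPlan` are hypothetical names, and the single-ID mapping (`0 <outside-id> 1`) is an assumed minimal case. The point it shows is that "deny" has to reach setgroups before the unprivileged gid_map write.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// procWrite is one write to a file under /proc/[pid] (hypothetical helper type).
type procWrite struct {
	File string // file name relative to /proc/[pid]
	Data string // bytes to write
}

// rootlessPlan returns the writes, in order, that map one outside UID/GID to
// ID 0 inside a freshly created user namespace. "deny" comes before gid_map
// because an unprivileged gid_map write is refused while setgroups is allowed.
func rootlessPlan(outsideUID, outsideGID int) []procWrite {
	return []procWrite{
		{File: "setgroups", Data: "deny"},
		{File: "uid_map", Data: fmt.Sprintf("0 %d 1\n", outsideUID)},
		{File: "gid_map", Data: fmt.Sprintf("0 %d 1\n", outsideGID)},
	}
}

// applyPlan performs the writes against procDir, for example "/proc/self".
// It is meant to be called by a process that has just entered a new user
// namespace; main below only prints the plan and does not call it.
func applyPlan(procDir string, plan []procWrite) error {
	for _, w := range plan {
		path := filepath.Join(procDir, w.File)
		if err := os.WriteFile(path, []byte(w.Data), 0o644); err != nil {
			return fmt.Errorf("write %s: %w", path, err)
		}
	}
	return nil
}

func main() {
	for _, w := range rootlessPlan(os.Getuid(), os.Getgid()) {
		fmt.Printf("%-9s <- %q\n", w.File, w.Data)
	}
}
```

Before this change, the first write in that plan is the one that failed with ENOENT in gVisor.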

I followed kernel/user_namespace.c. Each UserNamespace now carries a
setgroupsAllowed bit that is initialized true and inherited from the
parent at creation. Writing "deny" succeeds while gid_map is still empty
and clears the bit, while writing "allow" succeeds only if the bit is
still set, which preserves the one-way USERNS_SETGROUPS_ALLOWED
transition. The unprivileged branch of SetGIDMap now requires the bit to
be clear so that the behavior matches new_idmap_permitted, and
setgroups(2) is gated by the conjunction of the bit and a non-empty
gid_map, mirroring userns_may_setgroups.
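
A minimal model of these rules, written as a standalone Go sketch, may help when reading the diff. It is not the code in this PR: `userNS`, `newUserNS`, `writeSetgroups`, `setGIDMap`, and `maySetgroups` are hypothetical names, the `privileged` flag stands in for the real capability check, and the write-once rule for gid_map is taken from Linux behavior rather than from the description above.

```go
package main

import (
	"errors"
	"fmt"
)

var (
	errPerm  = errors.New("operation not permitted")
	errInval = errors.New("invalid argument")
)

// userNS is a simplified stand-in for a user namespace.
type userNS struct {
	setgroupsAllowed bool // one-way: true -> false only
	gidMapSet        bool // true once gid_map has been written
}

// newUserNS starts with setgroups allowed, or inherits the parent's state.
func newUserNS(parent *userNS) *userNS {
	ns := &userNS{setgroupsAllowed: true}
	if parent != nil {
		ns.setgroupsAllowed = parent.setgroupsAllowed
	}
	return ns
}

// writeSetgroups models a write to /proc/[pid]/setgroups.
func (ns *userNS) writeSetgroups(val string) error {
	switch val {
	case "allow":
		if !ns.setgroupsAllowed {
			return errPerm // cannot re-enable after deny
		}
	case "deny":
		if ns.gidMapSet {
			return errPerm // too late once gid_map is non-empty
		}
		ns.setgroupsAllowed = false
	default:
		return errInval
	}
	return nil
}

// setGIDMap models a gid_map write; unprivileged writers need deny first.
func (ns *userNS) setGIDMap(privileged bool) error {
	if ns.gidMapSet {
		return errPerm // assumption from Linux: the map is written once
	}
	if !privileged && ns.setgroupsAllowed {
		return errPerm
	}
	ns.gidMapSet = true
	return nil
}

// maySetgroups models the gate on setgroups(2).
func (ns *userNS) maySetgroups() bool {
	return ns.setgroupsAllowed && ns.gidMapSet
}

func main() {
	ns := newUserNS(nil)
	fmt.Println("unprivileged gid_map before deny:", ns.setGIDMap(false))
	fmt.Println("write deny:", ns.writeSetgroups("deny"))
	fmt.Println("write allow after deny:", ns.writeSetgroups("allow"))
	fmt.Println("unprivileged gid_map after deny:", ns.setGIDMap(false))
	fmt.Println("setgroups(2) permitted:", ns.maySetgroups())
}
```

Keeping the bit and the "gid_map is non-empty" condition as two separate fields makes the conjunction in `maySetgroups` explicit, which is the property the syscall gating tests exercise.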

Tests in test/syscalls/linux/proc_pid_uid_gid_map.cc cover the file's
read and write semantics, the one-way state transition, child-namespace
inheritance, the syscall gating before and after deny, and the gid_map
interaction. They pass on both runsc and native Linux.

Closes #1235 and removes the b/27454212 TODO that has sat in user_namespace.go since the initial commit.

@EtiennePerot (Collaborator) left a comment

Awesome :)

copybara-service Bot pushed a commit that referenced this pull request May 16, 2026
FUTURE_COPYBARA_INTEGRATE_REVIEW=#13147 from shayonj:setgroups-impl 43f9528
PiperOrigin-RevId: 916335886
@ayushr2 (Collaborator) commented May 17, 2026

The newly added tests are failing: https://buildkite.com/gvisor/pipeline/builds/42455

@shayonj (Contributor, Author) commented May 17, 2026

Thanks for the link! It looks like Linux inherits the parent namespace’s setgroups state, so the test can’t assume "allow", and the nested user-namespace case also needs the UID/GID maps written before the second unshare(CLONE_NEWUSER). I updated the tests to match that behavior and verified the failing native/runsc shard cases locally. The failures were in the test expectations, not the /proc/[pid]/setgroups implementation, so they should pass on the next run.
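
As a rough illustration of the "can't assume allow" point, here is a small Go sketch that derives the expected baseline from the file instead of hard-coding it. The actual tests are the C++ ones in test/syscalls/linux/proc_pid_uid_gid_map.cc; `parseSetgroups` is a hypothetical helper, not something from this PR.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// parseSetgroups interprets the contents of /proc/[pid]/setgroups, which
// reads back as "allow" or "deny" followed by a newline.
func parseSetgroups(content string) (allowed bool, err error) {
	switch strings.TrimSpace(content) {
	case "allow":
		return true, nil
	case "deny":
		return false, nil
	}
	return false, fmt.Errorf("unexpected setgroups content %q", content)
}

func main() {
	data, err := os.ReadFile("/proc/self/setgroups")
	if err != nil {
		fmt.Println("cannot read setgroups:", err)
		return
	}
	allowed, err := parseSetgroups(string(data))
	if err != nil {
		fmt.Println(err)
		return
	}
	// A user namespace created from here would start in this same state,
	// so a test should compare against this value, not a literal "allow".
	fmt.Println("inherited setgroups allowed:", allowed)
}
```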

Development

Successfully merging this pull request may close these issues.

Support disabling set_groups(2) for user namespaces
